An open source framework for end-to-end
re-forecast studies in GSI-WRF-MET

By Colin Grudzien

With thanks to:

Minghua Zheng, Ivette Hernández Baños, CW3E’s USAF Group and CW3E’s AR-Recon Team for important discussions about the GSI-WRF-MET stack and sharing library implementation details.

With very special thanks to:

Caroline Papadopoulos, Rachel Weihs, Christopher Harrop, CW3E’s West-WRF Team / NRT Team and CW3E’s Forecast Verification Team for sharing code that formed the basis of these workflows.

Center for Western Weather and Water Extremes.

Introduction

  • I am a data assimilation research scientist and scientific software developer at the Center for Western Weather and Water Extremes (CW3E).

  • One of CW3E's core programs is known as the Atmospheric River Reconnaissance (AR-Recon) program.

    • This is a research and operations partnership to fill the gaps in the observations of AR storm genesis and evolution in order to improve their predictability.
  • AR-Recon is led by CW3E and the NWS/NCEP, with core partners including multiple academic institutions and:

    • the U.S. Navy;
    • the U.S. Air Force;
    • the National Center for Atmospheric Research (NCAR); and
    • the European Centre for Medium Range Weather Forecasts (ECMWF).
  • I am a co-PI on projects in collaboration with the U.S. Air Force, leading efforts for the development and adoption of the novel Joint Effort for Data Assimilation Integration (JEDI) and Model for Prediction Across Scales (MPAS) framework.

  • While the JEDI Framework will be adopted for operations in the future, it is still currently in a state of rapid development and extensive testing.

    • Much of the research in JEDI currently concerns JEDI itself: understanding and improving its functionality and bringing it up to the skill of legacy systems.
  • However, there is currently a major gap in open source software for operating legacy data assimilation systems for benchmark studies.

Data assimilation as a workflow process

  • CW3E currently has an operational near-real-time ensemble forecast system, and has been developing an experimental data assimilation re-forecast system based on this ensemble forecast system.
  • Data assimilation is a notoriously complex operational problem, involving the interdependent steps of:
    • procuring;
    • pre-processing;
    • generating; and
    • post-processing
    large volumes of NWP data.
  • In order to scale to operations, data assimilation typically requires a workflow manager to interface between the user’s tasks and the supercomputer’s job scheduler.
  • From Christopher Harrop (creator of Rocoto):
  • Workflow Management is a concept that originated in the 1970s to handle business process management… to manage complex collections of business processes that need to be carried out in a certain way with complex interdependencies and requirements
    …scientific workflows are driven by the scientific data that “flows” through them… usually triggered by the availability of some kind of input data, and a task’s result is usually… fed as input to another task in the workflow.
  • The complexity of data assimilation cycling is demonstrated in the following data-flow diagram:

Data assimilation in a GSI-WRF-MET Stack

Diagram of data flows in GSI-WRF-cycling.

Data assimilation as a statistical learning problem

  • My own methodological framework treats data assimilation as a statistical learning problem.

  • I need to run many simulations:

    • to study hyper-parameter sensitivity in the learning problem; and
    • to generate a statistically relevant sample size for hypothesis testing and/or Bayesian modelling techniques.
  • My re-forecasting workflows are also non-standard from the perspective of operational forecasting;

    • rather than generating, e.g., a 10-day forecast at every zero hour, I need to run a forecast up to a specific valid time for verification, with varying forecast length to optimize resources.
  • I also perform simulations on multiple HPC platforms with different system architectures, job schedulers and software stacks, so I need to keep my software as portable and system-agnostic as possible.

  • These demands have led me to develop an experimental end-to-end data assimilation cycling system in the GSI-WRF-MET stack using the Rocoto Workflow Manager.

    • This builds principally on CW3E's NRT and Verification Teams' Rocoto / MET workflows, which provided the basis for this framework.
    • Christopher Harrop shared open source workflow scripts for all WRF steps, and templates for GSI integration, that are used currently at CW3E for its operational NRT products.
    • As a byproduct of research efforts, I have integrated these codes into a unified system for case-study analysis, using a Bash / Python data science stack.

CW3E’s Open Source Institutional Code Base

  • These code repositories are available on the CW3E Github.
  • Code is licensed for reuse, redistribution and modification under the Apache 2.0 Open Source License.
  • This is a permissive license that has conditions familiar to an academic environment:
    • original authors are given attribution; and
    • redistributed code includes change logs.
  • Derived products do not need to be made open source;
    • this can be reused for nearly any purpose by downstream users.
  • This makes these codes suitable for controlled information environments, while still benefiting from community interaction.

Data assimilation in a GSI-WRF-MET Stack

  • The framework includes an IPython API for workflow commands and for plotting results in Matplotlib.
    • This is designed to enable easy looping of:
      • valid dates,
      • task indices,
      • control flows, and
      • case studies,
    • as demonstrated to the right:
IPython API.
  • NOTE: there is no documentation (outside of comments);
  • Conversion to Cylc is necessary for:
    • long-term support, and for
    • developing a unified system for re-forecasting with formally “equivalent” experiments in GSI-WRF and JEDI-MPAS with verification performed on common metrics in MET.
  • This will build on the open source JEDI-MPAS workflow templates developed at NCAR.
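The looping described above can be sketched in Python; the `run_task` helper and the control-flow names below are hypothetical stand-ins for the actual API, shown only to illustrate the pattern:

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for issuing a workflow command (e.g., via Rocoto);
# the name and arguments are illustrative, not the real API.
def run_task(task, valid_date, control_flow):
    return f"{task} {valid_date:%Y-%m-%d_%H} {control_flow}"

# Loop a task over a range of valid dates and several control flows.
start = datetime(2019, 2, 8, 0)
valid_dates = [start + timedelta(hours=24 * i) for i in range(3)]
control_flows = ["3denvar_downscaled", "3denvar_lag06"]  # illustrative names

commands = [
    run_task("wrf_model", date, flow)
    for flow in control_flows
    for date in valid_dates
]
```

The same nested-loop pattern extends to task indices and case studies.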

Data assimilation in a GSI-WRF-MET Stack

  • Experiment settings are currently configured using an XML file to instruct Rocoto as below:
XML file controlling workflow.
  • However, this will be replaced with an analogous YAML file for the Cylc re-write.
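An illustrative comparison of the two configuration styles; the entity and key names below are hypothetical, not the project's actual settings:

```xml
<!-- Rocoto-style XML: experiment settings declared as entities -->
<!DOCTYPE workflow [
  <!ENTITY EXP_NAME "3denvar_control">
  <!ENTITY CYC_INT  "06:00:00">
]>
<workflow realtime="F" scheduler="slurm">
  <cycledef>201902080000 201902150000 &CYC_INT;</cycledef>
</workflow>
```

```yaml
# A possible analogous YAML layout for the Cylc re-write
exp_name: 3denvar_control
scheduler: slurm
cycling:
  start: 2019-02-08T00
  stop: 2019-02-15T00
  interval: PT6H
```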

Data assimilation in a GSI-WRF-MET Stack

Diagram of data flows in GSI-WRF-cycling.

Data assimilation in a GSI-WRF-MET Stack

  • In the Rocoto logs, this workflow is presented in cycle order as in the following:
Rocoto logs for GSI-WRF-cycling.
  • As a current limitation, new cycles are triggered with the ad-hoc boot_next_cycle task.
  • This ad-hoc job was created due to cycles not triggering in Rocoto workflows containing complex, inhomogeneous cycles.
  • This issue is planned to be fixed in the Cylc re-write.

Data assimilation in a GSI-WRF-MET Stack

  • However, this currently allows the mixing of asynchronous jobs in heterogeneous cycles, with specific settings depending on the forecast zero-hour.
Rocoto logs for GSI-WRF-cycling.
  • For a verification at a specified valid date of 2019-02-15_00_00, the above cycle needs a 4-day forecast.
  • However, the following cycle will only require a 3-day forecast.
Rocoto logs for GSI-WRF-cycling.
  • For offline re-forecast studies, this allows significant performance optimization of the workflow, eliminating the need to request unnecessary resources.
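The varying-forecast-length arithmetic is simple to sketch; the `forecast_hours` helper and the assumed daily zero hours are illustrative:

```python
from datetime import datetime

# Each cycle runs only up to the fixed verification valid date, so later
# cycles need shorter forecasts. The helper name is illustrative.
def forecast_hours(cycle_start, valid_date):
    """Forecast length in hours from a cycle's zero hour to the valid date."""
    return int((valid_date - cycle_start).total_seconds() // 3600)

valid = datetime(2019, 2, 15, 0)          # fixed verification valid date
cycle_a = datetime(2019, 2, 11, 0)        # needs a 4-day forecast
cycle_b = datetime(2019, 2, 12, 0)        # the following day needs only 3 days
```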

Data assimilation in a GSI-WRF-MET Stack

  • Each of these tasks is called as a scripted job, with run-time settings defined by the cycling workflow.

    • These scripts are workflow manager agnostic, as they simply require one's workflow to export Bash variables at run-time.
    • These scripts have been homogenized to handle the data flows with switches to determine sources of data.
  • E.g., the WRF driver script, wrf.sh, runs differently depending on whether it is:

    • a fresh forecast down-scaling a global background model;
    • a forecast with initial conditions generated from a data assimilation cycle; or
    • a restart run from a previous 6-hour, 1-way nested run, generating an extended forecast.
  • Currently, experiments are organized in a case study / control flow hierarchy:

    • one will name the case study under consideration, and
    • define a control flow name with all tuned settings, e.g., for the data assimilation method, WRF tunings, etc.
  • All associated settings are written to a static directory which is sourced by the Rocoto workflow;

    • this is designed so that all tuned settings are centrally archived for case study / configuration reproducibility.
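A minimal sketch of this workflow-agnostic driver pattern, with hypothetical variable and mode names (the real wrf.sh switches are more involved):

```shell
#!/bin/bash
# Sketch of a driver script whose behavior is selected by a variable the
# workflow manager exports at run-time. WRF_MODE and the mode names are
# illustrative, not the actual wrf.sh interface.
WRF_MODE="${WRF_MODE:-cycling}"

case "${WRF_MODE}" in
  downscale) MSG="fresh forecast down-scaling a global background" ;;
  cycling)   MSG="initial conditions from a data assimilation cycle" ;;
  restart)   MSG="restart from a previous 6-hour, 1-way nested run" ;;
  *)         echo "unrecognized WRF_MODE=${WRF_MODE}" >&2; exit 1 ;;
esac
echo "${MSG}"
```

Because the switch reads only exported variables, any workflow manager that can export Bash variables can drive the same script.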

MET Verification of 2022-2023 AR Season

  • MET forecast verification is a key aspect of this workflow used to objectively assess forecast skill.

  • We shift focus now to the MET verification tools in this workflow, with the 2022-2023 AR season as a highly applicable use case for these codes.

  • This is a perfect example of where one needs to analyze data from multiple:

    • valid dates;
    • forecast leads;
    • models; and
    • sub-domains.
  • Batch processing all this data is a non-trivial task, as a workflow should:

    • handle maps of hyper-parameter configurations,
    • systematically organize outputs,
    • be robust to missing data in the workflow stream,
    • perform error checks and exception handling,
    • generate traceable logs for debugging,
    • automate plotting axes / labels / legends, and
    • require minimal hard-coded configurations.
  • This problem arises frequently with tuning the GSI-WRF-Cycling-Template.
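The batch-processing requirements above can be sketched as a loop over a hyper-parameter grid that logs and skips missing data rather than failing; all names and values here are illustrative, not the template's actual code:

```python
import itertools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("batch")

# Placeholder loader: a real implementation would read MET output files.
# Here, leads beyond 120 hours stand in for missing data in the stream.
def load_stats(model, valid_date, lead, domain):
    if lead > 120:
        raise FileNotFoundError("no forecast at this lead")
    return {"model": model, "valid": valid_date, "lead": lead, "domain": domain}

# Map of hyper-parameter configurations: models x dates x leads x domains.
grid = itertools.product(
    ["WRF-A", "WRF-B"],    # models (illustrative)
    ["2023-01-09_00"],     # valid dates
    [24, 72, 144],         # forecast leads (hours)
    ["d01", "d02"],        # sub-domains
)

results, missing = [], []
for model, valid, lead, domain in grid:
    try:
        results.append(load_stats(model, valid, lead, domain))
    except FileNotFoundError as err:
        # Robustness to missing data, with a traceable log for debugging.
        log.warning("skipping %s %s lead=%s %s: %s",
                    model, valid, lead, domain, err)
        missing.append((model, valid, lead, domain))
```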

Output Directory Organization

Output directory tree.

Multidate / Multilead Heatplot Templates

Multidate / multilead heatplot examples.
  • Grid-Stat outputs are parsed, filtered and converted into Pandas dataframes for plotting and diagnostics.
  • Templates are designed for multidate / multilead heatplots for diagnostics over ranges of valid dates.
  • Missing / erroneous data is filtered by the workflow post-processing routines.
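A toy sketch of this parse-and-pivot step; the columns and values below are an illustrative stand-in, not real Grid-Stat output (which carries many more columns):

```python
import io

import pandas as pd

# Tiny stand-in for whitespace-delimited Grid-Stat ASCII output.
stat_text = """\
VALID LEAD RMSE
2023-01-09_00 24 3.1
2023-01-09_00 48 4.0
2023-01-10_00 24 2.8
2023-01-10_00 48 3.7
"""

df = pd.read_csv(io.StringIO(stat_text), sep=r"\s+")

# Pivot for a multidate / multilead heatplot:
# rows = valid dates, columns = forecast leads, cells = the statistic.
heat = df.pivot(index="VALID", columns="LEAD", values="RMSE")
```

The pivoted frame maps directly onto a Matplotlib heatplot, with the index and columns supplying the axis labels.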

Multi-lead Lineplot Templates

Multi-lead lineplot examples.
  • Templates are designed for multilead / multimodel / multidomain lineplots for diagnostics for particular valid dates.
  • Axes scale to the available forecast data, so configurations with different lead hours can be plotted together.
  • These templates are designed for batch analysis of large hyper-parameter grids.
  • New tools and templates are under continuous development, with research tools polished for re-use incrementally.

Conclusions

  • This is an ongoing research-to-operations project at CW3E, generating open source tools for both research and operations.

  • This benefits from community interaction, especially collaboration with the NRT and Verification Teams on workflow and research tool development.

    • However, this also supports scientific objectives for specific projects that are performed in controlled information environments.
  • As such, this infrastructure project presents opportunities for wider collaboration where scientific dissemination may be restricted.

    • This infrastructure talk highlights some of the benefits and flexibility of an open source model for infrastructure development.
  • Folks who are interested in collaborating on these tools are encouraged to reach out;

    • more users of the same code lead to more bug-fixes and greater development scalability.
  • A late colleague of mine from when I was a postdoc in Norway, Yongqi Gao, liked to reference a proverb that is applicable to open source development:

    “If you want to go fast, go alone. If you want to go far, go together”.
Thank you!